CLUSTERING NEWS ARTICLES USING K-MEANS AND N-GRAMS

  • Type: Project
  • Department: Computer Science
  • Project ID: CPU1848
  • Access Fee: ₦5,000 ($14)
  • Pages: 77 Pages
  • Format: Microsoft Word
  • Views: 673

ABSTRACT

Document clustering is an unsupervised machine learning technique that groups related items into clusters or subsets. The goal is to create clusters that are internally coherent but substantially different from one another: items within the same cluster should be highly similar, and highly dissimilar to items in other clusters. Automatic document clustering plays a significant role in many fields, including data mining and information retrieval. This thesis aims to improve the overall efficiency of a document clustering technique by using N-grams and an efficient similarity measure, and in doing so improves the purity and accuracy of the resulting clusters. The preprocessing method is based on N-grams (sequences of N consecutive characters), which give no special treatment to stop-words or punctuation but create an overlap among the contents of a document. This overlap makes the representation tolerant of errors, which substantially increases the quality of the clusters. The approach clusters news articles based on their N-gram representation, thereby reducing noise and increasing the probability that the sequences occur within the articles. The proposed clustering technique has parameters that can be adjusted at the document representation level to improve the efficiency and quality of the generated clusters. Experiments carried out in the R programming environment on the real-world Reuters-21578 and 20 Newsgroups datasets demonstrated the effectiveness of the proposed technique at different N-gram levels in terms of the accuracy and purity of the generated clusters. The results also showed that, on average, the proposed technique performs better than the baseline technique in both accuracy and purity, with the best results obtained when the N-gram window size is 3.
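The pipeline described in the abstract (character N-gram representation, K-means clustering, purity evaluation) can be illustrated with a minimal sketch. This is not the thesis's implementation: the thesis used R, whereas this sketch is in Python. The abstract does not name its similarity measure, so cosine similarity is assumed here. The function names, the sample headlines, and their class labels are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalised text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def normalise(vec):
    """Scale a sparse vector (a dict) to unit length; leave an all-zero vector as is."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (assumed measure)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for key, value in vec.items():
            total[key] = total.get(key, 0.0) + value
    return {key: value / len(vectors) for key, value in total.items()}


def kmeans(vectors, k, iters=20, seed=0):
    """K-means on unit vectors, assigning each one to its most similar centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # initial centroids: k random documents
    labels = [-1] * len(vectors)
    for _ in range(iters):
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = normalise(mean_vector(members))
    return labels


def purity(labels, classes):
    """Fraction of documents that belong to the majority class of their cluster."""
    by_cluster = {}
    for label, cls in zip(labels, classes):
        by_cluster.setdefault(label, Counter())[cls] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


# Hypothetical toy data; the thesis used Reuters-21578 and 20 Newsgroups.
docs = [
    "Oil prices rise as crude supply tightens",
    "Crude oil output cut lifts prices",
    "Team wins the hockey game in overtime",
    "Hockey players score twice in final game",
]
classes = ["energy", "energy", "sport", "sport"]
vectors = [normalise(char_ngrams(d, n=3)) for d in docs]
labels = kmeans(vectors, k=2)
print(labels, purity(labels, classes))
```

The parameter `n` corresponds to the N-gram window that the abstract reports as performing best at 3. Changing it alters only the document representation, which is the level at which the abstract says the technique can be tuned.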
